Hierarchical Storage Management
Using the IDS/SAM Distributed Data Storage System
Introduction
The Integrated Data Server(TM) (IDS 1000) and Storage and
Archive Manager File System(TM) (SAM-FS) provide complete
control over data storage costs and vulnerability. IDS/SAM is
designed to provide the ability to manage storage in a cost-
effective manner without sacrificing data integrity. The
hierarchical storage capabilities of IDS/SAM allow media
resources to be extended as needed. Short-term and often-
referenced data can be stored on-line on disk, moderate-term data
can be cost-effectively stored to tape, and long-term data can be
stored either on tape or optical disk.
The Differences Between Archiving and Backing Up
Once a file has been archived, its alternate copies exist for the life
of the file, so the file need not be stored on any other medium,
including on-line storage. The data of an archived file is, in
general, immediately available through the archive mechanism of
SAM-FS, because off-line archive storage is treated as an
addressable extension of the primary on-line disk storage.
Conversely, backup systems only take a snapshot of the current
state of the file system. Recovering a file (usually after a loss)
involves an extraction process that copies the file from the
backup media to on-line storage. Backup procedures are still
required for an archive system; however, rather than copying the
data (the data space), only the structure of the file system (the
name space) needs to be copied in the traditional manner.
RAID's Role in Storage Management
Today's disk array technology (RAID) offers reliable, high-capacity
storage. Its reliability comes from its general immunity to data
loss when any single component within the disk array fails.
However, RAID does not offer the storage variety, and thus the
cost-control flexibility, that hierarchical storage management
(HSM) offers. Nor does RAID offer much protection against
disasters. Even with RAID, alternate storage in the form of either
backups or archives is required.
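The text does not say which RAID level it has in mind. The sketch below assumes single-parity striping (RAID 4/5 style), where a parity block is the XOR of the data blocks in a stripe, so any one lost block can be rebuilt from the survivors. The function names are illustrative only. It also shows the limit described above: one failed component is survivable, but losing the whole array in a disaster is not.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    # The parity block is the XOR of every data block in the stripe.
    return xor_blocks(data_blocks)


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    # XOR of the parity with all surviving blocks yields the one missing block.
    return xor_blocks(surviving + [parity])


stripe = [b"ABCD", b"EFGH", b"IJKL"]
parity = make_parity(stripe)

# Simulate losing the second disk and rebuild its block from the others.
recovered = rebuild([stripe[0], stripe[2]], parity)
print(recovered)  # b'EFGH'
```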
With the IDS/SAM system, the administrator (and optionally the
end user) is given complete control over the storage of data, yet
remains completely removed from the details of where the data is
physically stored.
Automated Storage Management
IDS/SAM is designed to be completely automatic if desired. File
migration and on-line storage monitoring activities can be
performed automatically, using the storage monitor program.
These management services consist of two components: the
necessary support functions within SAM-FS, and the storage
monitor, which orchestrates the operation. The storage monitor is
started at boot time and performs the following basic tasks:
- Archives files that need to be archived
- Releases the on-line disk space of archived files
- Deletes files with expired life spans
Thresholds
The selection criteria for the above operations are configurable
via the extended attributes of SAM-FS. These attributes are read
at the time the storage monitor is initialized. Both high and low
thresholds are used to control the availability of on-line storage.
Once storage usage rises above the high water mark (e.g., 60%), the
storage monitor attempts to reduce on-line storage usage to below
the low water mark (e.g., 40%). Initial disk thresholds are
specified in the master configuration file and can be changed at
any time through the operator display utility (idsou).
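The two thresholds act as a hysteresis band: nothing happens until usage crosses the high water mark, and then enough space is freed to bring usage down to the low water mark, so the monitor does not oscillate around a single limit. This is a minimal sketch using the 60%/40% example values; `bytes_to_free` and its parameters are hypothetical names, not part of idsou or the master configuration file.

```python
def bytes_to_free(capacity: int, used: int, high_pct: int = 60, low_pct: int = 40) -> int:
    """Return how many bytes the storage monitor should try to release.

    Integer percentages avoid floating-point rounding in the comparisons.
    """
    if used * 100 <= capacity * high_pct:
        return 0  # at or below the high water mark: no action needed
    target = capacity * low_pct // 100  # bring usage down to the low water mark
    return used - target


print(bytes_to_free(1000, 650))  # 250
print(bytes_to_free(1000, 500))  # 0
```

Note that usage of 500 out of 1000 is above the low water mark but still triggers nothing, because the high water mark was never crossed.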
Recall timer
The storage monitor is designed to perform the above tasks at
regular intervals. The recall time determines how much time
elapses between successive passes in which the storage monitor
actively processes files. At each recall interval, the storage
monitor checks on-line storage usage to determine whether disk
space must be freed. The storage monitor first attempts to free
disk space by releasing the disk space of archived files that are
on-line. If usage has not fallen below the low water mark, the
storage monitor begins to archive files so their space can be
released as well. Upon completion of this phase, two additional
timers are checked to determine whether periodic archive or reaper
operations need to be performed. Periodic archiving archives data
without releasing its disk space, while the reaper deletes files
with expired life spans.
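One recall-interval pass can be sketched as below, following the two phases just described: release archived files first, then archive and release more files if usage is still too high. This is a simplified model, not the storage monitor's code. `OnlineFile` and `monitor_pass` are hypothetical; files are considered in list order (the text does not say how candidates are chosen); archiving is reduced to setting a flag; and the periodic-archive and reaper timer checks are left out. The `nodrop` flag mirrors the Nodrop attribute in Figure 1.

```python
from dataclasses import dataclass


@dataclass
class OnlineFile:
    name: str
    size: int
    archived: bool = False  # already has archive copies
    nodrop: bool = False    # Nodrop attribute: never release this file's space
    online: bool = True


def monitor_pass(files, capacity, high_pct=60, low_pct=40):
    """One recall-interval pass; returns the names whose disk space was released."""
    used = sum(f.size for f in files if f.online)
    if used * 100 <= capacity * high_pct:
        return []  # not above the high water mark: nothing to do this pass
    target = capacity * low_pct // 100
    released = []

    def release(f):
        nonlocal used
        f.online = False
        used -= f.size
        released.append(f.name)

    # Phase 1: release the space of on-line files that are already archived.
    for f in files:
        if used <= target:
            return released
        if f.online and f.archived and not f.nodrop:
            release(f)

    # Phase 2: still too full, so archive more files and release them as well.
    for f in files:
        if used <= target:
            break
        if f.online and not f.archived and not f.nodrop:
            f.archived = True  # placeholder for actually writing the archive copies
            release(f)
    return released


files = [
    OnlineFile("a", 100, archived=True),
    OnlineFile("b", 300),
    OnlineFile("c", 250, nodrop=True),
]
print(monitor_pass(files, capacity=1000))  # ['a', 'b']
```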
In an environment with robotically controlled media, the IDS can
truly manage both the on-line and the near-line storage in a
"lights out" autonomous manner.
File Attributes
To implement many of the enhanced features of the system,
SAM-FS files have a considerable number of additional attributes.
Figure 1 summarizes the file attributes available under SAM-FS.
Each of these attributes is stored as part of the file's inode and
is used either by the file system or by the various resource
management programs (e.g., the storage monitor). These attributes
are described in more detail in the appropriate sections.
Figure 1. File attributes available under SAM-FS

UNIX attributes:
  File size         Length of the file.
  Group             Group the file is identified with.
  User              Owner of the file.
  Access mode       Controls read, write and execute access.

SAM-FS attributes:
  Life span         Life span of the file.
  Cycles            Controls the creation of a cycle with each new
                    version of the file.
  Cycle life span   Life span of cycles for the file.
  Cycle limit       Maximum number of cycles permitted.
  Shadow            Controls shadow writing for the file.
  Direct access     Controls direct access to the file while it is
                    near-line.
  Nodrop            Controls whether the storage monitor may release
                    the on-line disk storage assigned to the file.
  Media residency   Controls the media formats used for archiving
                    the file.
Life spans
The user is given complete control over the storage of data. The
user can specify the life span of the file, how many archive copies
of the file must exist, and on what media type they must reside.
These parameters provide per-file input to the storage monitor,
which is responsible for managing on-line and archive storage.
The life span determines how long a file exists within the IDS.
Once a file's life span has expired, the file and any data
associated with it can be destroyed. For on-line storage, this
means removing the file's control structures (inodes) and any
assigned data blocks. For off-line storage, the archive space
becomes inactive and will eventually be recycled. The life span of
any cycle associated with the file can be controlled independently,
because the retention requirements of a file's cycles may differ
from those of the primary copy.
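The independence of file and cycle life spans can be sketched as two separate expiry checks. The record layout and function names here are hypothetical, and the sketch assumes a life span is counted from creation time and that `%1` names the newest cycle; the text states neither.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Cycle:
    number: int        # n in name%n; this sketch assumes %1 is the newest
    created: datetime


@dataclass
class ArchivedFile:
    name: str
    created: datetime
    life_span: timedelta
    cycle_life_span: timedelta
    cycles: list[Cycle] = field(default_factory=list)


def file_expired(f: ArchivedFile, now: datetime) -> bool:
    # Assumption: the life span is counted from the file's creation time.
    return now >= f.created + f.life_span


def expired_cycles(f: ArchivedFile, now: datetime) -> list[int]:
    # Each cycle ages against the cycle life span, independently of the file's own.
    return [c.number for c in f.cycles if now >= c.created + f.cycle_life_span]


doc = ArchivedFile(
    name="abc",
    created=datetime(2024, 1, 1),
    life_span=timedelta(days=3650),
    cycle_life_span=timedelta(days=30),
    cycles=[Cycle(1, datetime(2024, 3, 1)), Cycle(2, datetime(2024, 1, 5))],
)
now = datetime(2024, 3, 15)
print(file_expired(doc, now), expired_cycles(doc, now))  # False [2]
```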
Life spans give the user complete control over the retention
requirements of the data, rather than simply using access time as
the determining factor for deciding whether a file should be
deleted or retained. Most archiving systems assume that if data
has not been accessed over some period of time, it is no longer
needed and can be removed. Engineering documentation for aircraft
and financial records are illustrative counterexamples. The
engineering data must remain accessible for the life of the
aircraft, say 50 years, while the financial records may only need
to be kept for one year. In each case the data may never be
accessed, yet the requirement to retain it is clear.
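The contrast between an access-time rule and a life-span rule can be made concrete with the two examples from the text. Both rules below are simplified sketches; the 180-day idle limit and the dates are hypothetical values chosen for illustration.

```python
from datetime import date, timedelta


def keep_by_access_time(last_access: date, today: date, idle_limit: timedelta) -> bool:
    # Typical archiver rule: data idle longer than the limit is presumed unneeded.
    return today - last_access < idle_limit


def keep_by_life_span(created: date, today: date, life_span: timedelta) -> bool:
    # Life-span rule: keep the data until its declared retention period ends.
    return today < created + life_span


today = date(2010, 6, 1)
idle_limit = timedelta(days=180)  # hypothetical idle threshold

# (name, created, last accessed, life span); neither record has been read since written.
records = [
    ("aircraft drawings", date(2000, 1, 1), date(2000, 1, 1), timedelta(days=50 * 365)),
    ("financial records", date(2009, 9, 1), date(2009, 9, 1), timedelta(days=365)),
]
for name, created, accessed, span in records:
    print(name,
          keep_by_access_time(accessed, today, idle_limit),
          keep_by_life_span(created, today, span))
# aircraft drawings False True
# financial records False True
```

The access-time rule would discard both sets of records; the life-span rule keeps each for exactly as long as its retention requirement demands.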
Archive Copies
To control vulnerability, the user can specify up to four archive
copies of the data. For critical long-term data, the user may
require that two copies of the data exist on optical disk. This
flexibility allows the user to weigh storage costs against the
value of the data. Tape may be a less expensive medium, but its
reliable storage life is considerably shorter than that of the
more expensive optical disk medium.
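A per-file copy plan can be checked against both the four-copy limit and a requirement such as "two copies on optical disk." The plan format (a list of media-type names) and the function name are hypothetical; only the limit of four comes from the text.

```python
from collections import Counter

MAX_ARCHIVE_COPIES = 4  # upper limit stated in the text


def check_copy_plan(plan: list[str], required: dict[str, int]) -> bool:
    """plan lists the media type of each archive copy, e.g. ["optical", "tape"].

    Raises ValueError if too many copies are requested; otherwise returns
    whether the plan meets every per-media minimum in `required`.
    """
    if len(plan) > MAX_ARCHIVE_COPIES:
        raise ValueError(f"at most {MAX_ARCHIVE_COPIES} archive copies are allowed")
    counts = Counter(plan)
    return all(counts[media] >= n for media, n in required.items())


critical = {"optical": 2}
print(check_copy_plan(["optical", "optical", "tape"], critical))  # True
print(check_copy_plan(["optical", "tape"], critical))             # False
```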
Direct Access and Staging
To make efficient use of on-line storage and provide quick access
to near-line data, a file can be designated as "direct access." A
direct-access file can be read while near-line without first being
staged on-line, which allows large databases to be accessed
efficiently while they remain near-line. This works only for files
opened for reading: if the file is opened for read-write access, it
is staged on-line before it can be accessed.
Consider a database archived to optical disk. A single record
could be retrieved in much less time if it could be read directly
from archive storage without first staging the whole database to
disk. For optical disk, this can be done at nearly magnetic disk
speeds. Once the media has been mounted (5-10 seconds for a
jukebox) and the disk read (1-2 seconds for label processing by
the device scanner, etc.), the archive file can be randomly
positioned and the data read directly.
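The direct-access rule and the random positioning step can be sketched as follows, assuming fixed-size records and a known offset at which the file's data begins on the volume. A local file stands in for the mounted optical volume, and `needs_staging` and `read_record` are hypothetical names, not SAM-FS interfaces.

```python
import os
import tempfile


def needs_staging(mode: str, direct_access: bool) -> bool:
    """Return True if the file must be staged on-line before it is opened."""
    # Any mode that permits writing ("w", "a", or "+") forces staging.
    opens_for_write = any(ch in mode for ch in "wa+")
    return opens_for_write or not direct_access


def read_record(archive_path: str, data_offset: int, record_no: int, record_size: int) -> bytes:
    """Read one fixed-size record directly from an archive volume, without staging."""
    with open(archive_path, "rb") as f:
        # Skip to the start of the file's data, then to the requested record.
        f.seek(data_offset + record_no * record_size)
        return f.read(record_size)


# Demo: a temporary file stands in for the mounted optical volume.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "volume.img")
    with open(path, "wb") as f:
        f.write(b"\0" * 512)          # stand-in for a 512-byte archive header
        f.write(b"rec0rec1rec2rec3")  # four 4-byte records
    record = read_record(path, data_offset=512, record_no=2, record_size=4)

print(record)                                             # b'rec2'
print(needs_staging("rb", True), needs_staging("r+b", True))  # False True
```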
Pre-staging
For applications which need to access large portions of the data,
the file can be pre-staged to disk. Although not a requirement on
the part of the user, pre-staging may provide improved usage of
device resources by allowing stage requests for files resident on
the same archive media to be batched together.
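Batching pre-stage requests amounts to grouping them by archive volume, so each volume is mounted once. This sketch also orders each volume's files by position on the media, on the assumption that reading front to back is cheapest (as it is for tape). The request tuple layout and volume names are hypothetical.

```python
from collections import defaultdict


def batch_stage_requests(requests):
    """Group (file_name, volume, position) requests by volume.

    Returns {volume: [file_name, ...]} with each volume's files in media order,
    so every volume is mounted once and read front to back.
    """
    batches = defaultdict(list)
    for name, volume, position in requests:
        batches[volume].append((position, name))
    return {vol: [name for _, name in sorted(items)] for vol, items in batches.items()}


requests = [("a", "V1", 30), ("b", "V2", 5), ("c", "V1", 10)]
print(batch_stage_requests(requests))  # {'V1': ['c', 'a'], 'V2': ['b']}
```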
Cycles
A SAM-FS file can have up to 65,535 cycles, each representing,
in succession, some previous image of the file's data content.
Each time a file is opened for write access (this does not include
read-write access), a new file is created, and the old file becomes
the most recent cycle of the file. A user-selectable limit caps the
number of cycles a given file can have; once this limit is exceeded,
the oldest cycle is deleted. Cycles are referenced and can be read
using the UNIX percent syntax (name%n).
Cycles can be deleted using the rm command, for example rm abc%5.
When a cycle is deleted, all older cycles are also deleted.
Likewise, cycles can be renamed to files, much as normal files are
renamed, using the mv command (e.g., mv abc%4 abc4). When a cycle
is renamed, as in this example where abc4 is created, all cycles
older than abc%4 become the cycles of the newly created file abc4.
To facilitate easy removal of unwanted cycles, a purge command is
provided, which deletes the cycles of the supplied list of files
that are older than a certain level.
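The cycle rules above (new cycle on write, oldest dropped past the limit, rm removing a cycle and everything older, mv carrying older cycles along, purge trimming past a level) can be modeled with a list ordered newest first. This is a toy in-memory model, not SAM-FS behavior in detail; the class and method names are hypothetical, and it assumes `name%1` refers to the most recent cycle, which the text does not state.

```python
class CycledFile:
    """Toy model of a file and its cycles; cycles[0] is name%1 (newest)."""

    def __init__(self, name, data, cycle_limit=5):
        self.name = name
        self.data = data
        self.cycles = []
        self.cycle_limit = cycle_limit

    def write(self, data):
        # Opening for write creates a new file; the old one becomes cycle %1.
        self.cycles.insert(0, self.data)
        del self.cycles[self.cycle_limit:]  # drop the oldest beyond the limit
        self.data = data

    def read_cycle(self, n):
        return self.cycles[n - 1]

    def rm_cycle(self, n):
        # Deleting name%n also deletes every older cycle.
        del self.cycles[n - 1:]

    def mv_cycle(self, n, new_name):
        # name%n becomes new_name; cycles older than %n follow it.
        new = CycledFile(new_name, self.cycles[n - 1], self.cycle_limit)
        new.cycles = self.cycles[n:]
        del self.cycles[n - 1:]
        return new

    def purge(self, level):
        # Keep cycles %1..%level and delete anything older.
        del self.cycles[level:]


f = CycledFile("abc", "v0", cycle_limit=5)
for i in range(1, 7):
    f.write(f"v{i}")
print(f.data, f.cycles)        # v6 ['v5', 'v4', 'v3', 'v2', 'v1']
abc4 = f.mv_cycle(4, "abc4")   # like: mv abc%4 abc4
print(abc4.data, abc4.cycles)  # v2 ['v1']
f.rm_cycle(3)                  # like: rm abc%3
print(f.cycles)                # ['v5', 'v4']
```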
Media Management
Storage format
All archive files are written in tar format. The tar format was
selected because of its general portability among UNIX and other
systems. Any format could be used; all that is required is that
the file's data be written sequentially and that the offset to the
beginning of the data be known. For example, a direct archive file
can simply be copied to the media using cp. The data is, of course,
written sequentially, and the file offset is zero.
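The "known offset" requirement is easy to satisfy with tar, because each member's data immediately follows its 512-byte header. This sketch uses Python's standard `tarfile` module in plain ustar format (so no extended headers are added) and reads back the data offset that CPython's `tarfile` records in `TarInfo.offset_data`. It illustrates the format only and is not how SAM-FS writes its archives.

```python
import io
import tarfile

payload = b"record data for a direct-access archive file\n"

# Write one member in plain ustar format so no extended headers are added.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo(name="abc")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

archive = buf.getvalue()

# Re-open the archive and ask where the member's data begins.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
    offset = tar.getmember("abc").offset_data

# The data can now be read straight from the archive bytes at that offset.
data = archive[offset:offset + len(payload)]
print(offset, data == payload)  # 512 True
```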
Grouping
When a file is archived by the storage monitor, it is either written
directly to the archive media or grouped with other files before it
is written. The choice depends on the file's size; the cross-over
size is one of the supplied parameters. This allows large files to be
archived as stand-alone entities that are not only restorable
through the retrieval mechanisms of SAM-FS, but also easily read
and processed on other systems as well. To expedite the archive
process, files which require more than one archive image are
copied in parallel during archiving.
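The size cross-over and the parallel writing of multiple archive images can be sketched as below. `plan_archive`, `write_copies`, the `write_one` callback, and the choice that files exactly at the cross-over size are written stand-alone are all hypothetical; the text only says that the cross-over point is a supplied parameter and that multiple copies are written in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_archive(files, crossover):
    """files: list of (name, size). Returns (stand_alone, grouped) name lists."""
    stand_alone = [name for name, size in files if size >= crossover]
    grouped = [name for name, size in files if size < crossover]
    return stand_alone, grouped


def write_copies(name, copy_targets, write_one):
    """Write every archive image of one file concurrently, one thread per copy."""
    with ThreadPoolExecutor(max_workers=max(1, len(copy_targets))) as pool:
        # pool.map returns results in the same order as copy_targets.
        return list(pool.map(lambda target: write_one(name, target), copy_targets))


files = [("big", 500_000_000), ("small", 4096), ("mid", 10_000_000)]
print(plan_archive(files, crossover=10_000_000))  # (['big', 'mid'], ['small'])
print(write_copies("big", ["tape1", "optical1"], lambda n, t: f"{n}->{t}"))
# ['big->tape1', 'big->optical1']
```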
Device Scanner and Robotics Manager
The device scanner is an essential component for effective
management of removable media. The scanner monitors all peripheral
devices as well as the media request queue. When the scanner
observes that requested media is present on a particular device
(subject to any access restrictions currently imposed), it connects
the device with the open file that is requesting the media. At this
point the job requesting the data is awakened and processing
continues. The device remains assigned until the file is closed. If
the file resides on a multi-file media volume, SAM-FS positions the
media at the correct file.
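One sweep of the device scanner can be sketched as matching queued requests against devices that have media loaded and are not already assigned. The data classes, field names, and volume labels are hypothetical; the returned tuples stand in for "connect the device and wake the job," and the file index stands in for positioning on a multi-file volume. Access restrictions are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    job: str
    volume: str      # label of the media the job is waiting for
    file_index: int  # position of the file on a multi-file volume


@dataclass
class Device:
    name: str
    loaded_volume: Optional[str] = None
    assigned_to: Optional[str] = None


def scan(devices, queue):
    """One scanner sweep: connect waiting requests to devices holding their media."""
    connected = []
    for dev in devices:
        if dev.assigned_to is not None or dev.loaded_volume is None:
            continue
        for req in queue:
            if req.volume == dev.loaded_volume:
                dev.assigned_to = req.job  # stays assigned until the file is closed
                queue.remove(req)          # safe: we stop iterating right after
                connected.append((req.job, dev.name, req.file_index))
                break
    return connected


devices = [Device("od0", "OPT001"), Device("od1"), Device("mt0", "TAPE07", assigned_to="job9")]
queue = [Request("job1", "TAPE07", 0), Request("job2", "OPT001", 3)]
print(scan(devices, queue))     # [('job2', 'od0', 3)]
print([r.job for r in queue])   # ['job1']
```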
Robotic devices are treated as family set devices, and the drives
contained within the jukebox are its members. The robotics manager
monitors the request queue and schedules the automatic mounting of
requested media, including issuing robotic motion-control
instructions to the device. Once the media is present on one of
the member drives, the device scanner detects it and completes the
connection between user and device. Upon completion, the device
unloads the media and re-files it in either the jukebox or the
media magazine.
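The robotics manager's scheduling step can be sketched as picking an idle member drive of the family set and emitting a move command from the volume's storage slot, plus the reverse move when the media is re-filed. The slot and drive dictionaries and the command tuples are hypothetical stand-ins for real motion-control instructions, which the text does not describe.

```python
def schedule_mount(volume, slots, drives):
    """Plan a mount for `volume`.

    slots: {volume: slot_number} for media stored in the jukebox.
    drives: {drive_name: loaded_volume_or_None} for the family set's members.
    """
    for drive, loaded in drives.items():
        if loaded == volume:
            return ("already-mounted", drive)  # the device scanner will connect it
    if volume not in slots:
        raise LookupError(f"{volume} is not in the jukebox")
    for drive, loaded in drives.items():
        if loaded is None:
            drives[drive] = volume
            return ("move", slots.pop(volume), drive)
    return ("wait", None)  # no idle member drive; leave the request queued


def refile(drive, slot, slots, drives):
    """After the file is closed, unload the drive and return the media to a slot."""
    volume = drives[drive]
    drives[drive] = None
    slots[volume] = slot
    return ("move", drive, slot)


slots = {"OPT001": 3, "OPT002": 7}
drives = {"od0": "OPT001", "od1": None}
print(schedule_mount("OPT002", slots, drives))  # ('move', 7, 'od1')
print(schedule_mount("OPT001", slots, drives))  # ('already-mounted', 'od0')
```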
Summary
A principal design objective for the Storage and Archive Manager
File System(TM) is to provide a suitable framework for the cost-
effective control and safeguarding of distributed data.
The support services and features built into SAM-FS make it
ideal for protecting either short- or long-term data.
SAM-FS is available now
For more information on how SAM-FS and the Integrated Data
Server (IDS) can meet your network data storage needs,
contact LSC at (612) 482-4535, or e-mail inform@lsci.com for
additional Tech Notes.
(C)1994, LSC, Inc. All rights reserved.
Integrated Data Server (IDS), Storage and Archiving
Manager (SAM-FS) and Fast File Recovery System are
trademarks of LSC, Inc. All other trademarks are the property of
their respective owners.